Major Data Management platforms

FAIR - Findable, Accessible, Interoperable, Reusable

The Data Management platforms listed should respect the FAIR principles to some extent. The assessments follow this idea: if a platform offers an extensive API but its documentation is very difficult to find, then it is neither findable nor accessible, and it is rated as such. On the other hand, a platform based on common standards such as RDF or Apache Spark can be interoperable and reusable without having to document RDF or Apache Spark itself, as their documentation is readily available.

Contributions are welcome

Feel free to contribute by forking the repository or creating an issue: GitHub sources. We filled in the information to the best of our knowledge and after a reasonable amount of research, but that does not prevent us from making mistakes. We will be glad to receive any correction.

  • Ask for a change: create an issue,
  • Add or change something: fork, then do a Pull Request.

The site is written using Quarto and the main syntax is Pandoc, which is mostly like Markdown and is well documented on the Quarto web site. The pages are in the pages folder, with file names corresponding to the titles. Each entry follows the same model, so copy-paste is the best way to add a new one. Quarto has extensions for several editors, so a preview is often possible from within the editor. Visual Studio Code has good support.

Typical entry:

* [Name, quick description](https:URL of main web page)
    + Description and/or link to potential Docker image/Docker compose/Kubernetes manifest/…
    + Link to API(s), quick clarification on how well documented it is.
    + Interoperability **NONE/LOW/MEDIUM/HIGH** / No interoperability: explanation on why.
    + **Not/Partly/Mostly/Fully** cross-domain: explanation on why.
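
For contributors who prefer generating an entry over copy-paste, the model above can be sketched as a small Python helper. This is only an illustration and not part of the repository: the `Entry` class, the `format_entry` function and all field names are hypothetical, and only the output layout comes from the typical entry.

```python
from dataclasses import dataclass

# Rating scales taken from the typical entry above.
INTEROP_LEVELS = ("NONE", "LOW", "MEDIUM", "HIGH")
CROSS_DOMAIN_LEVELS = ("Not", "Partly", "Mostly", "Fully")


@dataclass
class Entry:
    """Hypothetical container for the fields of one platform entry."""
    name: str
    description: str
    url: str
    deployment: str              # Docker image, Docker Compose, Kubernetes manifest, ...
    api: str                     # link to the API(s) and note on documentation quality
    interoperability: str        # one of INTEROP_LEVELS
    interoperability_reason: str
    cross_domain: str            # one of CROSS_DOMAIN_LEVELS
    cross_domain_reason: str


def format_entry(entry: Entry) -> str:
    """Render an entry using the list layout of the typical entry."""
    if entry.interoperability not in INTEROP_LEVELS:
        raise ValueError(f"unknown interoperability level: {entry.interoperability}")
    if entry.cross_domain not in CROSS_DOMAIN_LEVELS:
        raise ValueError(f"unknown cross-domain level: {entry.cross_domain}")
    lines = [
        f"* [{entry.name}, {entry.description}]({entry.url})",
        f"    + {entry.deployment}",
        f"    + {entry.api}",
        f"    + Interoperability **{entry.interoperability}**: {entry.interoperability_reason}",
        f"    + **{entry.cross_domain}** cross-domain: {entry.cross_domain_reason}",
    ]
    return "\n".join(lines)
```

Calling `print(format_entry(Entry(...)))` produces a block that can be pasted into the relevant page. Rejecting unknown rating values keeps the ratings consistent across entries.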

Navigational elements are configured in _quarto.yml

Progress

  • TODO: Eventually a clear view of the adaptability to cloud architectures.

Process

Top-down approach: first a listing with technical information (online, containers, …), then API support, and finally interoperability (data format of the API), possibility of transfer to other domains and adaptability to cloud. Major platforms are platforms that are popular and easy to find. Drafts should go in the to_sort or to_investigate pages.

The format is kept as simple as possible so it is easier to modify. We find it more important to encourage contributions in order to keep the list up to date. This is why we don’t use tables, for instance.

To Work-On

  • Write about DM approaches that are not tied to dedicated software (for instance the semantic web, RDF or Apache Spark).

  • Better presentation, without making editing more complex

Life Science

Open Source or free

Commercials

Pharmaceutics

Chemistry & Chemical Sciences

Materials science

Geomatics

Open Source

Commercials

Urbanism

Open Source or free

Commercials

Digital Lab Notebooks

  • OpenBis
    • Docker image
    • Extensive APIs, well documented
    • Interoperability HIGH: while there is no REST API, the other APIs are well documented and the Python API is simple to use even for non-developers.
    • Fully cross-domain: OpenBis is highly configurable, so it can be used as a notebook in any domain and can also serve other needs if properly configured. However, it is not the easiest platform to set up and configure.
  • eLabFTW
    • Docker
    • Extensive APIs, well documented
    • Interoperability HIGH: the REST API uses well-documented JSON and the Python API offers plenty of examples.
    • Fully cross-domain: eLabFTW is highly configurable, so it can be used as a notebook in any domain and can also serve other needs if properly configured.
  • RSpace, electronic notebook and inventory management
    • Open Source and commercial solutions.
    • Docker Compose
    • Extensive API and SDK, well documented.
    • Interoperability HIGH: uses well-documented JSON.
    • Fully cross-domain: RSpace is highly configurable, so it can be used as a notebook in any domain and can also serve other needs if properly configured.

Astronomy

Astronomy is largely standardized and relies on online catalogues and on many tools developed over several decades. Many of these are standalone tools and are potentially cloud-friendly, though they will need some technical adaptation. Most platforms are not cross-domain, as they provide data from astronomical observations. If there is an actual need for such data in another domain, the technical effort for the integration will probably be worth it. That effort varies greatly between platforms and can be rather high, given the many specialised tools and the specialised vocabulary.

Humanities

Bibliography

Generalists

All are inherently cross-platform and most follow the FAIR principles.

Authority control

Others

  • NextCloud
    • NextCloud is a file hosting service, similar to Google Drive or Microsoft OneDrive, but open source and self-hostable. As a free file hosting platform, it does not constrain which data are stored and does not enforce complex metadata or formats. It can be extended and adapted using plugins, for instance to integrate an office suite such as LibreOffice. As such, it should be considered first and foremost a support application: a tool to support your work, or a way to store data for other applications. The absence of constraints makes it a poor choice for FAIR data storage, as the users would need to do most of the required work themselves. Still, when connected via its API to other software, it could be an essential part of a Data Management solution, even if only as a scrapbook. NextCloud has an extensive API, so constraints could be enforced by routing access through the API with a dedicated pipeline, either in connection with another application or for a specific use case.
    • Docker, Docker Compose and Kubernetes installation, with Helm chart
    • Extensive API
    • Interoperability HIGH: the API is well documented and there are several ways of doing things. However, NextCloud is highly flexible, so its intended use needs to be defined clearly, and uploads may need to be restricted to the API in order to integrate it into a larger assembly.
    • Fully cross-domain
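
As an illustration of using NextCloud "with constraint via the API", the following minimal Python sketch rejects files that do not follow a naming convention before preparing an upload over NextCloud's WebDAV endpoint (`remote.php/dav/files/<user>/...`). The naming rule, the function names, the server URL and the credentials are all assumptions made for this example. A real pipeline would probably also validate metadata and file content.

```python
import base64
import re
import urllib.parse
import urllib.request

# Hypothetical naming convention: <project>_<YYYY-MM-DD>_<label>.csv or .json
FILENAME_RULE = re.compile(r"[a-z0-9]+_\d{4}-\d{2}-\d{2}_[a-z0-9-]+\.(csv|json)")


def is_allowed(filename: str) -> bool:
    """Return True when the file name follows the (assumed) convention."""
    return FILENAME_RULE.fullmatch(filename) is not None


def webdav_url(base_url: str, user: str, folder: str, filename: str) -> str:
    """Build the WebDAV URL of a file in the user's NextCloud storage."""
    parts = [user, *folder.strip("/").split("/"), filename]
    path = "/".join(urllib.parse.quote(p, safe="") for p in parts if p)
    return f"{base_url.rstrip('/')}/remote.php/dav/files/{path}"


def build_upload_request(base_url: str, user: str, app_password: str,
                         folder: str, filename: str,
                         payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a PUT request for an accepted file."""
    if not is_allowed(filename):
        raise ValueError(f"file name rejected by the pipeline: {filename}")
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return urllib.request.Request(
        webdav_url(base_url, user, folder, filename),
        data=payload,
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the upload. Keeping the check in one function makes it easy to replace the naming rule with whatever constraint a given use case requires.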